
[Topi, x86] Using MKL blas for quantized dense #6115

Merged
merged 5 commits into apache:master on Jul 28, 2020

Conversation

@anijain2305 (Contributor) commented on Jul 22, 2020

Using MKL for quantized dense, following the MKL fallback for FP32 dense.

On a C5.12xlarge instance (Cascade Lake, with VNNI support), results for BERT base are as follows (latency in ms):

| Type      | Sequence length | MXNet+MKLDNN | TVM Alone | TVM+MKL |
|-----------|-----------------|--------------|-----------|---------|
| FP32      | 128             | 33.56        | N/A       | 16.83   |
| Quantized | 128             | 23.94697     | 77.16     | 11.36   |

@icemelon9 @eric-haibin-lin @shoubhik

TVM alone performs poorly because we don't have a good integer dense schedule.
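
For context, here is a minimal sketch (my own illustration, not code from this PR) of how an int8 dense workload would typically be compiled with the MKL library enabled so that dense offloads to the MKL GEMM. The shapes, the target flags, and the `USE_MKL` build requirement are assumptions for illustration.

```python
# Minimal sketch (assumption, not from this PR): compile a quantized dense op
# with MKL enabled so that int8 dense falls back to the MKL integer GEMM.
import tvm
from tvm import relay

# uint8 activations x int8 weights -> int32 output, as the MKL integer GEMM
# path expects. Shapes are illustrative (a BERT-base-like FFN layer).
data = relay.var("data", shape=(128, 768), dtype="uint8")
weight = relay.var("weight", shape=(3072, 768), dtype="int8")
dense = relay.nn.dense(data, weight, out_dtype="int32")
mod = tvm.IRModule.from_expr(relay.Function([data, weight], dense))

# "-libs=mkl" routes dense to the external MKL BLAS implementation (requires a
# TVM build with MKL enabled); "-mcpu=cascadelake" enables VNNI instructions.
target = "llvm -mcpu=cascadelake -libs=mkl"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target)
```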

@eric-haibin-lin (Member) left a review comment:

Is the MXNet+MKLDNN baseline also in int8?

@anijain2305 (Contributor, Author)

@eric-haibin-lin Yes, the MXNet+MKLDNN baseline is also in int8.

@TaoLv (Member) commented on Jul 23, 2020

It would be better to also show the performance of TVM before using the MKL s8u8s32 GEMM.

@anijain2305 (Contributor, Author)

@TaoLv Good point, I added the latency numbers for TVM alone. Thanks for pointing it out!

@anijain2305 (Contributor, Author)

@icemelon9 Can you please manage this PR?

@anijain2305 (Contributor, Author)

Ping @icemelon9

@tqchen (Member) commented on Jul 28, 2020

While it is OK to make use of MKLDNN in this case, we should always work hard to get good integer schedules and learn from the insights, as we did for the CUDA softmax and other cases.

@anijain2305 (Contributor, Author)

@tqchen Agreed. For now, my reasoning was simply to extend the MKL fallback to int8, but I agree that it will be better to focus on TVM schedules. That will require more work, as even the FP32 dense schedules are not well optimized.

@icemelon (Member) left a review comment:

LGTM

@icemelon merged commit 8cd53e0 into apache:master on Jul 28, 2020

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request on Aug 26, 2020:
* [Topi, x86] Using MKL blas for quantized dense

* Typo

* CBLAS_OFFSET only available for MKL

* Skipping tests as GPU CI uses Openblas

* Retrigger

Co-authored-by: Ubuntu <ubuntu@ip-172-31-0-202.us-west-2.compute.internal>
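
The "Skipping tests as GPU CI uses Openblas" note above refers to guarding the int8 dense tests so they only run against an MKL build, since OpenBLAS has no `CBLAS_OFFSET`/`cblas_gemm_s8u8s32`. Below is a hedged sketch of what such a guard can look like; the registered packed-function name and the test body are assumptions, not the PR's actual test code.

```python
# Hypothetical test guard (assumption, not the exact code from this PR):
# skip the int8 dense test when TVM was not built against MKL.
import pytest
import tvm


def mkl_int8_gemm_available():
    # get_global_func with allow_missing=True returns None when the packed
    # function is not registered, e.g. in a build without MKL.
    # The registered name below is an assumption for illustration.
    func = tvm.get_global_func("tvm.contrib.mkl.matmul_u8s8s32", allow_missing=True)
    return func is not None


@pytest.mark.skipif(
    not mkl_int8_gemm_available(),
    reason="requires TVM built with MKL; GPU CI links OpenBLAS",
)
def test_quantized_dense_with_mkl():
    # ... build and run the uint8 x int8 dense workload here ...
    pass
```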
trevor-m pushed the same commit to trevor-m/tvm again on Aug 26 and Sep 2, 2020, and to neo-ai/tvm on Sep 3, 2020, referencing this pull request.